On the Limits of Cache-Oblivious Matrix Transposition
نویسنده
چکیده
Intuitively, a cache-oblivious algorithm implements an adaptive strategy which runs efficiently on any memory hierarchy without requiring previous knowledge of the parameters of the hierarchy. For this reason, cache-obliviousness is an attractive feature of an algorithm meant for a global computing environment, where software may be run on a variety of different platforms for load management purposes. In this paper we present a negative result on cache-obliviousness, namely, we show that an optimal cache-oblivious algorithm for the fundamental primitive of matrix transposition cannot exist without the tall cache assumption, which forces the (unknown) parameters of the memory hierarchy to satisfy a certain technical relation. Our contribution specializes the result of Brodal and Fagerberg for general permutations to matrix transposition, and provides further evidence that the tall cache assumption is often necessary to attain optimality in the context of cache-oblivious algorithms.
منابع مشابه
On the limits of cache-oblivious rational permutations
Permuting a vector is a fundamental primitive which arises in many applications. In particular, rational permutations, which are defined by permutations of the bits of the binary representations of the vector indices, are widely used. Matrix transposition and bit-reversal are notable examples of rational permutations. In this paper we contribute a number of results regarding the execution of th...
متن کاملCache Oblivious Matrix Transposition: Simulation and Experiment
A cache oblivious matrix transposition algorithm is implemented and analyzed using simulation and hardware performance counters. Contrary to its name, the cache oblivious matrix transposition algorithm is found to exhibit a complex cache behavior with a cache miss ratio that is strongly dependent on the associativity of the cache. In some circumstances the cache behavior is found to be worst th...
متن کاملCache-oblivious Algorithms Cache-oblivious Algorithms Acknowledgments
This thesis presents “cache-oblivious” algorithms that use asymptotically optimal amounts of work, and move data asymptotically optimally among multiple levels of cache. An algorithm is cache oblivious if no program variables dependent on hardware configuration parameters, such as cache size and cache-line length need to be tuned to minimize the number of cache misses. We show that the ordinary...
متن کاملCache - Oblivious Algorithms by Harald Prokop
This thesis presents “cache-oblivious” algorithms that use asymptotically optimal amounts of work, and move data asymptotically optimally among multiple levels of cache. An algorithm is cache oblivious if no program variables dependent on hardware configuration parameters, such as cache size and cache-line length need to be tuned to minimize the number of cache misses. We show that the ordinary...
متن کاملPractically efficient methods for performing bit-reversed permutation in C++11 on the x86-64 architecture
The bit-reversed permutation is a famous task in signal processing and is key to efficient implementation of the fast Fourier transform. This paper presents optimized C++11 implementations of five extant methods for computing the bit-reversed permutation: Stockham auto-sort, naive bitwise swapping, swapping via a table of reversed bytes, local pairwise swapping of bits, and swapping via a cache...
متن کامل